Abstract

In this paper we present an alternative approach to symbolic
segmentation: instead of implementing a new method, we treat
symbolic segmentation as an algorithm selection problem. That
is, given n available algorithms for symbolic segmentation, a
selection mechanism forms a set of input features and image
attributes and selects the best algorithm on a case-by-case
basis. The selection mechanism is demonstrated within an
algorithm framework where the selection is made among a set of
algorithm networks. Two sets of experiments are performed, and
in both cases we demonstrate that algorithm selection improves
the result of the symbolic segmentation by a considerable
amount.

1. Introduction 

The research field of computer vision currently contains
several very hard open problems. One of them is symbolic
segmentation: in this task the algorithm must segment images
into meaningful regions and then detect the objects present in
the image. Both segmentation and object recognition have been
studied extensively using various approaches. For instance,
several dedicated resources exist for segmentation in various
contexts [27, 10, 7]. Similarly, algorithms have been developed
for various contexts, such as natural images [25, 38, 3, 24],
medical images [37, 17, 29, 1] or biological images [2, 28].
Object recognition has received even more attention due to the
very high industrial interest in it. Some of the recent
approaches to object recognition and detection include
[20, 12, 14, 8, 6].

The combination of segmentation and recognition is, however,
more difficult, and a relatively smaller number of studies and
approaches have been proposed. For instance, semantic
segmentation has been implemented as a combination of
segmentation and recognition [5], with probabilistic models
[40, 16], with convolutional networks [11], or with other
approaches for specific conditions [34], a unified framework
[19] or interleaved recognition and segmentation [18]. Some of
the main difficulties of semantic segmentation are:

a Segmentation by humans depends on recognition and
higher-level information [42].

b Recognition depends directly on the features and on the
regions from which the features are extracted.

c The context of the image strongly modulates segmentation
and object recognition.

Consequently, symbolic segmentation is very complex due to
the mutual influence of recognition and segmentation, and the
designed algorithms are generally highly specific to some
particular features or context.

In computer science and other fields requiring algorithms, it
happens very often that several algorithms are implemented to
solve the same or a similar problem in varying contexts,
environments or types of inputs. The reason for such diversity
and specificity is that real-world problems are much more
complex and dynamic than current state-of-the-art software and
hardware can handle. Consequently, several works have used
algorithm selection to improve the results obtained for
various problems.

In this paper we propose an algorithm selection approach to
the problem of symbolic segmentation. We base our work on the
platform for algorithm selection previously proposed in [22].
We show that using algorithm selection and high-level
reasoning about the results of algorithm processing makes it
possible to iteratively improve the result of semantic
segmentation. We analyze two different approaches to algorithm
selection, using either a Bayesian Network (BN) or a Support
Vector Machine (SVM). The main contributions of this paper
are:

1. Analysis of an iterative algorithm selection framework
in the context of symbolic segmentation

2. Evaluation of two different machine learning approaches
for the selection of semantic segmentation algorithms

3. Demonstration of the fact that, despite the low precision
of the algorithm selector, the resulting semantic
segmentation is improved

This paper is organized as follows. Section 2 introduces
related and previous work, and Section 3 introduces the
algorithm selection framework. Section 4 describes the
experiments and the results, and Section 5 concludes the paper
and discusses future extensions.

2. Previous Work and Background 

Algorithm selection has been used previously in the area of
image processing as well as in certain applications of
computer vision. The general idea behind algorithm selection
is to select a unique algorithm for a particular set of
properties, attributes and features extracted from the data or
obtained prior to processing. Algorithm selection was
originally proposed by Rice [36] for the problem of operating
system scheduler selection. Since then algorithm selection has
been used in various problems but has never become a
mainstream approach to problem solving.

Algorithm selection is not mainstream for two reasons: on one
hand it is necessary to find distinctive features, and on the
other hand the problem studied should be difficult enough that
extracting additional features from the input data is
computationally advantageous.

The distinctive features might be too expensive
(computationally) to obtain, and thus algorithm selection
requires choosing the features that provide the highest
quality of algorithm selection using the smallest number of
features. This idea is illustrated in Figure 1. Figure 1a
shows that when the features are not well identified,
algorithm selection cannot uniquely determine the best
algorithm because the features are non-distinctive for the
available algorithms. A counterexample using distinctive
features is shown in Figure 1b.

The ratio of the computational effort required to extract
additional features to the computation of the whole result can
be estimated by comparing their respective computation times.
In [23] it was shown that for the task of image segmentation
the benefit of algorithm selection is directly proportional
to the size of the processed region of the image. If the
region of segmentation is too small, the tested algorithms
produce segmentations with very similar f-values, and thus the
fastest (computationally least expensive) algorithm is
selected. For larger regions, up to regions having the size of
the input image, algorithm selection is advantageous both
computationally and because of the increased quality of the
result.

Figure 1: Example illustrating (a) non-distinctive and (b)
distinctive features

In computer vision and image processing, algorithm selection
was previously applied at various levels of algorithmic
processing. For instance, segmentation of artificial [41] or
biological images [39] was successfully implemented using an
algorithm selection approach. A set of features was found
sufficient and made it possible to clearly separate the areas
of performance of the different algorithms. These two
approaches, however, separated the available algorithms only
with respect to the noise present in the image. Moreover, the
algorithms used were single-level edge detectors such as Canny
or Prewitt. More complex algorithms for image segmentation
were studied in [21, 23]: similarly to [41, 39], a method
using machine learning for algorithm selection was developed,
here for the segmentation of natural real-world images. Other
approaches have studied parameter selection or the improvement
of image processing algorithms using either machine learning
or analytical methods, but they are in general confined to a
single algorithm [15, 33, 35].

Methods and algorithms aimed at understanding real-world
images have in general a quite limited extent of application.
Currently there is a large amount of work combining
segmentation and recognition, for example [16, 5]. The
approach in [18] interleaves object recognition and
segmentation in such a manner that the recognition is used to
seed the segmentation and to obtain more precise contours of
the detected objects. In [4] objects are detected by combining
part detection and segmentation in order to obtain better
object shapes. More general approaches such as [19] build a
list of available objects and categories by learning them from
data samples and reducing them to relevant information using
a dictionary tool. However, this approach does not scale to
arbitrary size because the labels are not structured and
ultimately require complete knowledge of the whole world.








The approach in [13] uses depth information to estimate
whole-image properties such as occlusions, background and
foreground isolation, and point of view in order to determine
the type of objects in the image. All the modules of this
approach are processed in parallel and integrated in a single
final step. An airport apron analysis is performed in [9],
where the authors use motion tracking and understanding
inspired by cognitive vision techniques. Finally, image
understanding can also be approached more holistically, as for
instance in [31], where the intent is only to estimate the
nature of the image and to distinguish between mostly natural
and mostly artificial content.

3. Algorithm Selection for Symbolic Segmentation

The framework used in these experiments was originally
introduced in [22]. Its schematic representation is shown in
Figure 2. The processing starts by extracting features (1)
from the input image, which are used by the algorithm selector
(2) to determine the most appropriate algorithm. The input
image is processed by the selected network of algorithms (3),
which results in a symbolic segmentation of the input image.
The symbolic segmentation result is interpreted by
constructing a multi-relational graph representing the
high-level description (4). The high-level description is
analyzed for symbolic contradiction (5).

Figure 2: Algorithm Selection Platform

The contradiction is detected using a contradiction measure
based on co-occurrence statistics obtained from training data.
If a contradiction is detected, a new hypothesis about the
region containing the contradiction is generated from the
largest co-occurrence statistics given the symbolic
segmentation, with all but one region being fixed. Once the
new hypothesis is generated it is used as an additional input
to the algorithm selector. Finally, features are extracted
from the region of the contradiction. This new set of feature
values and hypothesis attributes is used for a new algorithm
selection.

The newly selected algorithm processes the whole image and
generates a new symbolic segmentation. The region that
previously contained the contradiction is extracted from it
and merged with the original high-level description (4). The
new high-level description is analyzed and the cycle begins
again. The processing stops when, for a given input, there are
no more contradictions or when no more algorithms can be
selected. This amounts to either having no more errors in the
high-level description or being unable to generate new
hypotheses. This platform will be referred to as Iterative
Analysis (IA), as it incrementally changes the high-level
description of the input image.
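
The cycle described above can be sketched as follows. This is
only an illustrative Python outline, not the platform of [22]:
the callables select and find_contradiction, the region-to-label
dictionary, and the rule that each algorithm is tried at most
once are all assumptions made for the sketch.

```python
from typing import Callable, Dict, List, Optional, Tuple

Labels = Dict[str, str]                 # region id -> symbolic label (assumed format)
Hypothesis = Tuple[str, str]            # (region id, proposed label)
Algorithm = Callable[[object], Labels]


def iterative_analysis(
    image: object,
    algorithms: Dict[str, Algorithm],
    select: Callable[[object, Optional[Hypothesis], List[str]], Optional[str]],
    find_contradiction: Callable[[Labels], Optional[Hypothesis]],
) -> Labels:
    """Run the select / segment / verify / merge cycle until it stops."""
    tried: List[str] = []
    name = select(image, None, tried)            # initial selection: features only
    if name is None:
        return {}
    tried.append(name)
    description = dict(algorithms[name](image))  # initial high-level description
    while True:
        hypothesis = find_contradiction(description)
        if hypothesis is None:
            return description                   # no contradiction left
        name = select(image, hypothesis, tried)
        if name is None:
            return description                   # no algorithm left to select
        tried.append(name)
        result = algorithms[name](image)         # the whole image is re-processed
        region = hypothesis[0]
        if region in result:
            description[region] = result[region]  # merge only the disputed region


# Toy usage with two hypothetical algorithms and a hand-written rule.
algorithms = {
    "A": lambda img: {"r1": "sky", "r2": "boat"},
    "B": lambda img: {"r1": "sky", "r2": "bird"},
}


def select(image, hypothesis, tried):
    remaining = [n for n in algorithms if n not in tried]
    return remaining[0] if remaining else None


def find_contradiction(description):
    # Toy rule: any region labeled "boat" is treated as contradictory.
    for region, label in description.items():
        if label == "boat":
            return (region, "bird")
    return None


result = iterative_analysis("image", algorithms, select, find_contradiction)
```

In this toy run the first algorithm labels region r2 as a boat,
the rule flags it, and only that region is replaced by the
output of the second algorithm.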

The output of a symbolic segmentation algorithm is a set of
labeled regions. The high-level interpretation (description)
consists of building a multi-relational graph that specifies
the relations between the labeled regions in the resulting
image. Using this graph, the result is checked for
contradiction, and a hypothesis about the relations of the
recognized objects is generated if necessary. Currently, the
IA platform uses co-occurrence statistics obtained from
training data to estimate the contradiction and to propose the
most viable hypothesis. The estimated relations are the
relative position (left, right, above, below), the relative
size (larger, smaller, same), background/foreground (in
front, back) and one single-object property, the shape (Hough
transform). Each of these properties is applied to either a
pair of objects or an individual object, and the probability
of contradiction is computed as a cumulative normalized
product of all individual scores. An example of IA processing
an image is shown in Figure 3.
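
One possible reading of this scoring is sketched below in
Python. The paper does not give the exact normalization, so the
sketch assumes that each relation is scored by its relative
frequency in the training data and that the "normalized
product" is a geometric mean; the function names, the tuple
format and the floor value are hypothetical.

```python
from collections import Counter
from math import prod


def learn_cooccurrence(samples):
    """Relative frequency of each (label_a, kind, value, label_b) relation.

    Frequencies are normalized per (label_a, kind, label_b), so the values
    of one relation kind between two labels sum to 1.
    """
    counts = Counter(samples)
    totals = Counter((a, kind, b) for a, kind, _, b in samples)
    return {s: n / totals[(s[0], s[1], s[3])] for s, n in counts.items()}


def contradiction_probability(observed, stats, floor=1e-3):
    """One minus the geometric mean of the scores of the observed relations."""
    if not observed:
        return 0.0
    # Unseen relations get a small floor score instead of zero (assumption).
    scores = [max(stats.get(rel, 0.0), floor) for rel in observed]
    return 1.0 - prod(scores) ** (1.0 / len(scores))


# Toy training data: a boat is below the sky in 9 of 10 samples.
training = ([("boat", "position", "below", "sky")] * 9
            + [("boat", "position", "above", "sky")])
stats = learn_cooccurrence(training)
p_ok = contradiction_probability([("boat", "position", "below", "sky")], stats)
p_bad = contradiction_probability([("boat", "position", "above", "sky")], stats)
```

With these toy counts, the rarely observed relation (boat above
sky) receives a much higher contradiction probability than the
common one.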

The verification is intended as an additional source of
information; the reasoning over the recognized regions is
performed only on the relational level, and thus our method is
applicable only if two or more regions are detected.

4. Experiments 

To evaluate the proposed framework we used the VOC2012 data
and three different algorithms for symbolic segmentation
[16, 5, 11]. Each of the algorithms uses similar or no
preprocessing, a different segmentation, and a similar machine
learning based classification for object recognition. All
three algorithms have been evaluated and tested on the VOC2012
data set.

As introduced in Section 3, the high-level verification
requires multiple object detections in one image.
Consequently, the testing and the training of the IA platform
were carried out only on images that contain more than one
distinct object. The training set requires not only that the
input contains more than one object in the ground truth but
also that at least one of the algorithms used is able to
detect at least two objects in the image. Failing that, the
verification procedure is not triggered and the iterative
process of high-level understanding improvement cannot be
started. The experiments are carried out over various feature
sets and terminating conditions. We evaluate two different
algorithm selection methods: a Bayesian Network (BN) and a
Support Vector Machine (SVM). The motivation for using these
two methods is, on one hand, the ability of the BN to use a
hierarchy of information and thus to reduce the complexity of
learning and, on the other hand, the simplicity and generally
good learning results of the SVM.
Figure 3: Exemplar processing of an input image by the IA
platform. (a) Input/Original Image (b) Ground Truth

4.1. Training of the Algorithm Selector

For SVM algorithm selection, two SVMs were initially trained:
one for the initial selection of algorithms from image
features only (SVMf) and one for algorithm selection using
features and hypothesis attributes (SVMa). Such an approach is
one of the possible solutions [26] to the problem of missing
values in the inputs of an SVM [32]. However, it was shown
experimentally that the patching approach [26] outperformed
the two separate SVMs. Using the patching approach, whenever
the attributes of an image could not be obtained (the
hypothesis was not generated, or it is unknown), the attribute
values were replaced by the average of the available values.
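
A minimal Python sketch of this kind of mean patching is given
below. It only illustrates the idea of replacing missing
attributes by the average of the available values; the helper
names and the use of None for a missing value are assumptions,
and the actual procedure of [26] may differ.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]


def attribute_means(rows: Sequence[Row]) -> List[float]:
    """Column-wise mean over the values that are present (None = missing)."""
    means = []
    for j in range(len(rows[0])):
        present = [row[j] for row in rows if row[j] is not None]
        # A column with no observed value falls back to 0.0 (assumption).
        means.append(sum(present) / len(present) if present else 0.0)
    return means


def patch(row: Row, means: Sequence[float]) -> List[float]:
    """Replace every missing attribute by the mean of its column."""
    return [m if v is None else v for v, m in zip(row, means)]


# Toy attribute vectors: the last sample has no hypothesis at all.
attributes = [[2.0, 10.0], [4.0, None], [None, None]]
means = attribute_means(attributes)
patched = [patch(row, means) for row in attributes]
```

A single classifier can then be trained on the patched vectors,
whether or not a hypothesis was available for a given sample.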

The first training data set Tf is equivalent to the VOC2012
training data set. In the case of SVMf only features are
extracted. The feature vector contains altogether 7856
feature values composed of histograms of various features.
The features used are brightness, FFT, Gabor, wavelets, RGB
intensity, acutance, and so on. The second training data set
Ta is created from bounding boxes around the semantic
segmentations in the training set of the VOC2012 data set. It
uses the same features as Tf and, additionally, a set of
attributes extracted with the Matlab regionprops function from
the region corresponding to the correct semantic segmentation.

In the case of the BN only Ta is used for training, as the BN
is well suited to handling missing input values. However, the
BN approach requires discrete input values (observations).
Because most of the extracted features are continuous values
within a certain range, it is necessary to cluster the data
into discrete values. The clustering is done using equal
ranges for each value, given by (1):

$$V_i = \left[ \frac{\max_f - \min_f}{k}\,(i - 1),\; \frac{\max_f - \min_f}{k}\,i \right] \qquad (1)$$
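
The equal-range clustering of (1) can be illustrated with a
short Python sketch (the experiments in the paper were done in
Matlab). The function names are hypothetical, and the sketch
assumes that the ranges are offset by the minimum feature value
and that values on the upper boundary fall into the last range.

```python
from typing import List, Sequence


def bin_index(x: float, lo: float, hi: float, k: int) -> int:
    """Return i in 1..k such that x falls into the i-th equal-width range."""
    if hi <= lo:
        return 1                       # degenerate feature: a single range
    width = (hi - lo) / k
    i = int((x - lo) / width) + 1
    return min(max(i, 1), k)           # clamp values on or outside the borders


def discretize(values: Sequence[float], k: int) -> List[int]:
    """Map continuous feature values to k discrete observations."""
    lo, hi = min(values), max(values)
    return [bin_index(v, lo, hi, k) for v in values]


observations = discretize([0.0, 2.5, 6.0], k=3)
```

Here the range [0, 6] is split into three intervals of width 2,
so the three values fall into ranges 1, 2 and 3.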

The BN structure is shown in Figure 4, and the inputs are
specified by three categories: application specifications,
hypothesis attributes and image features. The application
specification represents input information about the target
application and other application-related information that is
constant in the framework of this study. The attributes are
regional properties extracted with the regionprops command in
Matlab and represent the attributes for each of the available
hypotheses. The hypotheses are the labels available for region
labeling. Here the labels correspond to the 20 classes and the
background from the VOC database. Each attribute for a class
is calculated as the average of the values extracted from all
objects of that class encountered in the training data set.
The features extracted from the image are clustered together
with the attributes as described in the next subsection.

Figure 4: Bayesian Network used for Algorithm Selection in
some experiments

Both the training and the testing data, however, are fairly
imbalanced, as can be seen in Table 1.

Table 1: The distribution of samples representing each of
the algorithms

                     ALE [16]    COMP6 [11]    CPMC [5]
Share of samples     35%         42%           23%
Number of samples    964         1133          633

The creation of the training sets follows different
principles depending on whether the training set is Tf or Ta.
For the Tf data set, each sample image is evaluated as
$\sum_{c \in C_I} F_c$, with $C_I$ being all labels present in
the ground truth of image I and $F_c$ being the f-value of the
symbolic segmentation of class c in image I. In Ta the
evaluation of each algorithm is done only with respect to the
region representing a single label fully enclosed in the
bounding box provided by the VOCdevkit.
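
As an illustration, a per-image score of this kind could be
computed as in the Python sketch below. It assumes that the
f-value is the usual harmonic mean of pixel-wise precision and
recall and that the per-class values are summed over the labels
present in the ground truth; both are assumptions of the sketch,
as is the flat list-of-labels image format.

```python
from typing import Sequence, Set


def f_value(predicted: Set[int], truth: Set[int]) -> float:
    """Harmonic mean of precision and recall over two sets of pixel indices."""
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


def image_score(predicted: Sequence[str], truth: Sequence[str]) -> float:
    """Sum of per-class f-values over the classes present in the ground truth."""
    total = 0.0
    for label in set(truth):
        pred_pixels = {i for i, l in enumerate(predicted) if l == label}
        true_pixels = {i for i, l in enumerate(truth) if l == label}
        total += f_value(pred_pixels, true_pixels)
    return total


# Toy 4-pixel image: one background pixel is mislabeled as "cat".
truth = ["bg", "bg", "cat", "cat"]
predicted = ["bg", "cat", "cat", "cat"]
score = image_score(predicted, truth)
```

The algorithm with the highest score on an image would then be
the training label of that image for the selector.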

Finally, the experimental results have shown that using all
data for learning the algorithm selection is not well suited,
because many images have relatively close results when
processed by more than one algorithm. Let $I_j$ be an input
image and $F_{nj}$ the f-value calculated on the output of
algorithm n applied to $I_j$, and let
$\{F_0 > F_1 > \dots > F_N\}$ be the ordered set of the
$F_{nj}$. Then $I_j$ is used for learning if
$|F_0 - F_1| > \theta$. In most of the experiments in this
paper $\theta$ was set to 0.5.
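
The sample filter can be sketched in Python as follows, assuming
that the criterion compares the two best f-values of an image
against a threshold. The function names and the dictionary
format are hypothetical; only the default threshold 0.5 comes
from the text.

```python
from typing import Dict


def keep_for_training(f_values: Dict[str, float], theta: float = 0.5) -> bool:
    """Keep an image only if the best algorithm clearly beats the runner-up."""
    ranked = sorted(f_values.values(), reverse=True)
    return len(ranked) >= 2 and (ranked[0] - ranked[1]) > theta


def best_algorithm(f_values: Dict[str, float]) -> str:
    """Training label of the image: the algorithm with the highest f-value."""
    return max(f_values, key=f_values.get)


clear_case = {"ALE": 0.9, "CPMC": 0.2, "COMP6": 0.3}
close_case = {"ALE": 0.6, "CPMC": 0.5, "COMP6": 0.55}
kept = [keep_for_training(clear_case), keep_for_training(close_case)]
label = best_algorithm(clear_case)
```

Images such as the second one, where several algorithms perform
almost equally well, carry little information about which
algorithm to select and are dropped from the training set.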

4.2. Testing of the Platform 

The testing of the system was done over a subset of images
from the VOC2012 validation data set: images that contain at
least two objects in the ground truth. First we evaluate the
ability of the algorithm selector to learn to classify the
images according to which algorithm results in the best
symbolic segmentation. To evaluate the classification power of
both selectors we analyzed results both for binary
classification (with two different algorithms for semantic
segmentation) and for multi-class classification (using all
three available semantic segmenters). Then the whole system is
analyzed by looking at the resulting data.

First we evaluated the BN for various levels of data
clustering. Intuitively, the size of the BN depends directly
on the number of values of the input observations: the
conditional probability tables in the nodes to which the
observable inputs are connected grow according to $k^n$, with
k being the number of observable values of the input variables
and n being the number of input variables connected to the
node. The experimentation using the BN was carried out in
Matlab using the BNT [30] package, and the learning of the BN
was performed using the EM algorithm. The results of
evaluating the BN classification power on the data set with
respect to the number of observed data values are shown in
Table 2. Notice that for k = 7 the EM algorithm used for BN
learning results in a very high classification error rate, and
for any k > 7 the EM does not converge. The BN is fairly
limited in the number of input nodes as well, because the
conditional probability table in each node of the BN grows
according to $k^n$. Consequently, using the BNT Matlab package
we were able to experiment with a BN having at best 10
six-valued input feature variables.

Table 2: BN classification results with respect to the number
of data values

Clusters    BN Classification Error
3           53%
5           53%
6           42%
7           92%
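
To give a sense of scale, the short Python sketch below counts
the entries of a single conditional probability table under the
simplifying assumption that all n input variables are parents of
one node; the actual network of Figure 4 is structured
differently, and the function name is hypothetical.

```python
def cpt_entries(k: int, n_parents: int, child_states: int) -> int:
    """Number of entries in a CPT whose n parents each take k values."""
    return child_states * k ** n_parents


# A three-valued node (one state per algorithm) with 10 input parents.
small = cpt_entries(k=3, n_parents=10, child_states=3)
large = cpt_entries(k=6, n_parents=10, child_states=3)
```

Going from three to six values per input multiplies the table
size by 2^10 = 1024, which illustrates why both the number of
clusters and the number of input nodes had to stay small.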

Because the BN requires the best features for high quality of
classification, we performed two different classification
experiments with the BN: (a) a search for the best features
for the BN and (b) using clustered PCA features. The results
using the Ta data set are shown in Table 3.

Table 3: BN classification results

Task       Clusters    Number of Features    BN Error
2-class    3           8 PCA                 50%
2-class    3           5                     49%
3-class    2           21                    48%
3-class    3           11 PCA                49%

Contrary to the BN, the
SVM uses continuous feature values and only normalization is
required. Moreover, the SVM works well with large input
vectors, which are in general reduced using PCA for increased
speed and accuracy of classification. The results of testing
the SVM classification using the Tf and Ta training data are
shown in Table 4. The evaluation was done using two data sets:
one contained image regions (bounding boxes with individual
semantic segmentations) and the other contained full images
(denoted FI in Table 4).

Table 4: SVM classification results using the 3 most
significant PCA features. (FI) means that the training and
testing data use whole images.

Task       Data set         SVM Classification Error
2-class    Train Ta         25%
2-class    Test Ta          27%
2-class    Train Tf         34%
2-class    Test Tf          37%
3-class    Train Ta         47%
3-class    Test Ta          51%
3-class    Train Tf         50%
3-class    Test Tf          53%
3-class    (FI) Train Ta    42%
3-class    (FI) Test Ta     46%
3-class    (FI) Train Tf    47%
3-class    (FI) Test Tf     54%

The
main result that can be seen in Table 4 is that the
classification error rate is significantly smaller with
hypothesis attributes than when the SVM is using only
features. Moreover, in all experiments where no attributes are
used, the SVM is given the mean values of the attributes. When
the SVM was used completely without the attributes and was
trained exclusively on the features, the results had even
lower accuracy of selection. Consequently, all experiments on
the IA platform were done using a single SVM that was given
either features and mean values of attributes or features and
hypothesis attributes. Moreover, as can be expected, the
classification error rate is significantly lower for two
algorithms than for three.

We can see that both the learning on whole images and the
learning on segments perform relatively poorly with both the
SVM and the BN. However, the IA platform uses high-level
verification, and thus it was tested with the better of the
two algorithm selectors, the SVM.

To evaluate the IA platform, data from the VOC2012 trainval
set was used. The average precision of the three algorithms
and of the iterative analysis approach is shown in Table 5.

Table 5: The f-measures for all three algorithms and for the
presented approach

Algorithm    Average f-value
COMP6        44.469%
IA           43.554%
ALE          43.144%
CPMC         32.060%

Some examples of processing are shown in Figure 5. Notice
that despite the low selection accuracy a number of images are
improved by selecting the regions from each algorithm.

To see how well the IA approach performs, we compare the
average precision for each class. The comparison of each
algorithm's results is shown in Table 6. As can be seen, due
to the low level of learning our IA framework outperformed the
highest per-class precision in only three classes of objects:
boat, bus and dog. For the rest of the categories the IA
approach was able to outperform most of the algorithms but
one. This is due to the fact that the selection accuracy is
relatively low. Notice that, according to the schematic of the
IA platform, the low accuracy of the algorithm selector could
be compensated by a stronger verification and reasoning
mechanism. Consider the third row in Figure 5. A better
reasoning procedure would lead to a result like the
hypothetical and ideal case shown in Figure 3 rather than to
the result shown in the last column of the third row in
Figure 5. The simplest heuristic, one that would prevent
region replacements that directly reduce the f-value, could
increase the overall result without any significant
computational overhead. Similar heuristics for the removal of
improbable regions can also be implemented in parallel to the
co-occurrence statistics. Thus even a relatively inaccurate
algorithm selection combined with simple high-level
verification would lead to better results.

5. Conclusion 

In this paper we introduced a soft computing approach to the
semantic segmentation problem. The method is based on an
algorithm selection platform whose goal is to increase the
quality of the result by reasoning on the content of the
algorithms' outputs. The IA platform for image understanding
iteratively improves the high-level understanding and, even
with a very weak algorithm selector, can in many cases
outperform the best algorithm by combining the best results of
each available algorithm.
Figure 5: Selected results from the IA platform. Each row represents one particular input. Column (1) shows the input image,
column (2) shows the human-generated ground truth, columns (3)-(5) show the results of the three available algorithms in
the order [16, 11, 5], and the last column shows the result obtained by the IA platform.


In the future several direct extensions and improvements to
the IA platform are planned. First, the algorithm selection
accuracy must be improved. Second, the high-level verification
requires a more robust method of contradiction detection and
hypothesis generation. Co-occurrence statistics are not
sufficient because of their dependence on the training data.
Finally, the result merging requires a more flexible and
robust mechanism in order to avoid a decrease in result
quality.


































Table 6

               Algorithms
Class          IA         CPMC       ALE        COMP6
background     62.554%    76.430%    56.020%    80.248%
aeroplane      78.292%    -          61.750%    78.292%
bicycle        27.752%    13.124%    27.221%    32.228%
bird           15.190%    28.443%    13.318%    25.932%
boat           36.701%    30.934%    36.692%    32.400%
bottle         44.317%    41.224%    40.131%    56.265%
bus            75.846%    51.433%    74.244%    72.300%
car            49.148%    28.672%    49.696%    39.259%
cat            60.457%    58.134%    64.043%    58.910%
chair          14.664%    3.885%     19.565%    18.576%
cow            30.195%    -          31.334%    2.943%
diningtable    38.848%    25.049%    38.010%    54.087%
dog            49.950%    38.775%    49.949%    41.761%
horse          39.941%    29.805%    45.293%    51.031%
motorbike      50.562%    39.305%    50.991%    34.006%
person         44.531%    43.712%    42.786%    61.879%
pottedplant    22.978%    22.065%    27.603%    38.502%
sheep          67.456%    39.326%    72.930%    71.171%
sofa           26.792%    24.301%    28.751%    16.612%
train          46.152%    32.323%    46.195%    4.945%
tvmonitor      32.310%    46.323%    29.512%    62.495%
Average        43.554%    32.060%    43.144%    44.469%